Text-to-speech synthesis with arbitrary speaker's voice from average voice

Authors

Masatsune Tamura

Takashi Masuko

Keiichi Tokuda

Takao Kobayashi

Abstract

This paper describes a technique for synthesizing speech with any desired voice. The technique is based on an HMM-based text-to-speech (TTS) system and the MLLR adaptation algorithm. To generate speech of an arbitrarily given target speaker, speaker-independent speech units, i.e., average voice models, are adapted to the target speaker using the MLLR framework. In addition to spectrum and pitch adaptation, we derive an algorithm for adaptation of state duration. We demonstrate that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features. Synthetic speech generated from models adapted using only four sentences is very close to that from speaker-dependent models trained using a large amount of speech data.
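The abstract does not give the adaptation equations, so the following is only a minimal sketch of the mean update commonly used in MLLR, where each Gaussian mean of the average voice model is mapped by a shared affine transform, mu_hat = A mu + b. The function names, the 2-dimensional means, and the values of A and b are hypothetical and chosen for illustration; in a real system A and b would be estimated from the target speaker's adaptation sentences by maximizing likelihood.

```python
# Sketch of an MLLR-style mean transform: mu_hat = A * mu + b.
# All names and numbers are illustrative assumptions, not taken from the paper.

def adapt_mean(A, b, mu):
    """Apply the affine transform to a single Gaussian mean vector."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]

def adapt_models(A, b, means):
    """Adapt every mean that shares this transform (one regression class)."""
    return [adapt_mean(A, b, mu) for mu in means]

# Hypothetical average-voice means for three HMM states.
average_voice_means = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]

# Hypothetical transform; normally estimated from adaptation data.
A = [[1.0, 0.5],
     [0.0, 2.0]]
b = [0.5, -1.0]

adapted_means = adapt_models(A, b, average_voice_means)
print(adapted_means)
```

Because one transform is shared across many Gaussians, a handful of adaptation sentences can move all of the model means at once, which is consistent with the abstract's claim that a few sentences suffice.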

Related resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...


Spectral voice conversion for text-to-speech synthesis

A new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented. It is applied to a residual-excited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted ...


Average-Voice-Based Speech Synthesis

This thesis describes a novel speech synthesis framework, "Average-Voice-based Speech Synthesis." By using this framework, synthetic speech of arbitrary target speakers can be obtained robustly and steadily even if very few speech samples are available for the target speaker. This speech synthesis framework consists of a speaker normalization algorithm for the parameter cluster...


GMM Classification of TTS Synthesis: Identification of Original Speaker's Voice

This paper describes two experiments. The first one deals with evaluation of synthetic speech quality by reverse identification of original speakers whose voices had been used for several Czech text-to-speech (TTS) systems. The second experiment was aimed at evaluation of the influence of voice transformation on the original speaker recognition. The paper further describes an analysis of the in...


Multimodal Speech Synthesis

The main goal of the speech synthesis group in SmartKom was to develop a natural sounding synthetic voice for the avatar "Smartakus" that is judged to be agreeable, intelligible, and friendly by the users of the SmartKom system. Two aspects of the SmartKom scenario facilitate the achievement of this goal. First, since speech output is mainly intended for the interaction of Smartakus with the ...


Publication date: 2001
